DOCUMENT RESUME 



ED 385 593 



TM 024 041 



AUTHOR          Powers, Donald E.
TITLE           Coaching for the SAT: A Summary of the Summaries and an Update. [Reprint.]
INSTITUTION     Educational Testing Service, Princeton, N.J.
REPORT NO       ETS-RR-93-32
PUB DATE        93
NOTE            9p.
PUB TYPE        Information Analyses (070) -- Journal Articles (080)
JOURNAL CIT     Educational Measurement: Issues and Practice; p24-30,39 Sum 1993
EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     *College Entrance Examinations; *Cost Effectiveness; Higher Education; High Schools; *Mathematics Tests; Meta Analysis; *Test Coaching; *Time Factors (Learning)
IDENTIFIERS     *Scholastic Aptitude Test



ABSTRACT 

Several available summaries of research on coaching for the Scholastic Aptitude Test (SAT) are summarized and their principal findings discussed. Some additional studies that have been completed since these summaries were reported are considered and linked to the summaries. The four major meta-analyses considered are those of: (1) Messick and Jungeblut, 1981; (2) DerSimonian and Laird, 1983; (3) Kulik, Bangert-Drowns, and Kulik, 1984; and (4) Becker, 1990. Taken together, these studies indicate that the effects of coaching (special test preparation) are somewhat greater for the more curriculum-related mathematics section of the SAT than for the verbal section. Longer coaching programs tend to yield somewhat greater effects, but simply doubling the effort does not double the effect. It is also apparent that the estimation of coaching effects depends on the degree to which spurious effects are controlled (e.g., regression, self-selection, noncomparable scores, differential motivation). In general, recent studies are consistent with the meta-analytic summaries. Those who seek coaching for the SAT should consider not only the expected benefits but also the cost in terms of time and money. Two tables summarize study findings. (Contains 30 references.) (SLD)




Reproductions supplied by EDRS are the best that can be made from the original document.




Reprinted with Permission 



ETS RR-93-32 



Coaching for the SAT: A Summary of the Summaries and an Update

Donald E. Powers 
Educational Testing Service 



U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization originating it.

Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not necessarily represent official OERI position or policy.



Why is it important to investigate the effectiveness of coaching for a test such as the SAT? How can you evaluate the effectiveness of coaching? What are some common misconceptions? What do we know, and what is still unclear, about the effects of coaching? What can we tell students and their families about the results of coaching studies?



Performance on tests such as the Scholastic Aptitude Test (SAT) and the American College Testing (ACT) Program Assessments is of at least some importance to the several million college-bound students who take these tests each year. Because of the high-stakes nature of these examinations, it is perhaps not surprising that parents, schools, and students are interested in maximizing performance on them. In response to this attention, numerous secondary schools, commercial firms, and private entrepreneurs have developed a wide variety of special preparations for these tests, primarily for the SAT. Nowadays there is what can best be described as a bewildering array of resources to help students prepare for the SAT (e.g., Powers, 1988). Test preparation books, software, and coaching courses of various sorts abound.

This article has two major aims. The first is to describe briefly several available summaries of research on coaching for the SAT and to discuss their principal findings. The second is to consider several additional studies that have been completed since these summaries were reported and to relate the new findings to the summaries. At issue is whether special preparation (in particular, preparation that can be provided over a relatively short term) can have a significant impact on test scores beyond the effects of regular schooling. The topic is important for several reasons. First, if extra preparation is effective but not reasonably available to all test takers, then some test takers may have an unfair advantage over others. Second, if short-term preparation geared mainly toward test-taking tricks is effective, then the interpretation of test scores as indicators of general academic ability (instead of simply the ability to take tests) is called into question. Third, because test preparation can be both costly and time-consuming, it may detract from students' participation in other worthwhile academic activities.



Misconceptions 

The rate at which new test preparation resources are developed and marketed appears far greater than the rate at which adequate information is generated about the effectiveness of these offerings. Indeed, any comprehensive evaluation of extant test preparation resources seems unlikely. If the claims of the coaching enterprises have any validity, the most effective of the lot are the courses given by commercial coaching companies, which often promise, or even "guarantee," score improvements of 100 points or more.[1] These claims, however, appear to be based at least partly on false premises, partial information, and selective reporting. More will be said about these shortcomings later.

Consumers apparently are more easily convinced of the effectiveness of special test preparation than are those who research the topic. The most common misconception is that simple test-score gains from one occasion to another adequately reflect the effects of coaching. Threats to the validity of this interpretation are readily apparent to experimental designers and measurement specialists. Because of practice with taking tests, measurement error, and real growth in abilities, an individual's test scores will vary from one test administration to another, regardless of any intervening test preparation. Quite predictably, some examinees will register large score increases upon retesting, and others will exhibit large decreases. Often, however, the only evidence needed to sway test takers is a large score increase by a single fellow student, taken as simple proof that coaching works. The problem is exacerbated when these mislabeled "effects" are reported selectively by coaching enterprises or by their clients. Coaching schools are more likely to publicize score increases than decreases, and big score gainers are more inclined to proclaim their success than score losers or "no changers" are to declare their lack of it.

Donald E. Powers is a Senior Research Scientist at the Educational Testing Service, Mail Stop 17-R, Princeton, NJ 08541. His specialization is educational measurement.

Educational Measurement: Issues and Practice



Components of Test Score Gains

Even if they are fully reported, however, simple test score gains do not constitute a valid measure of effectiveness. As stated above, gains may reflect any or all of the following: test practice (i.e., simply having taken the test before), growth in the abilities measured by the test, and measurement error. Although the effect of test practice, unconfounded with other factors, is difficult to assess, it appears that simply repeating the SAT may improve scores, perhaps by about 16 points on the 200-800 verbal scale and about 12 points on the 200-800 math scale. These estimates, which are based on all students who took the SAT as juniors in the spring of 1990 and again as seniors in the fall of the same year (College Board, 1991), undoubtedly reflect some growth as well.

With respect to growth, the rate at which the verbal and quantitative reasoning skills measured by the SAT develop has not been studied systematically, and students likely develop these abilities at different rates. Some information is available, however. For example, in a longitudinal study of young, highly able students, Burton (Wilder, Casserly, & Burton, 1988) demonstrated an average yearly improvement of about 50 points on the verbal and mathematical portions of the SAT for students tested repeatedly between the ages of 13 and 17. These relatively consistent improvements probably resulted mainly from growth in abilities, because test practice effects should decrease over time. Growth may be less dramatic for the typical SAT taker than for these extremely able students. It is, however, a significant component of SAT score gains for all test takers and one that must be accounted for when estimating the effects of special test preparation.

Although growth and practice increase test scores, measurement error may contribute to either increases or decreases upon retesting. Typically, about 1 in 25 SAT takers will gain 100 or more total points, and about 1 in 110 will lose 100 or more points on retesting. Predictably, these changes depend on students' initial scores, with low-scoring examinees more likely to register the largest gains and high-scoring test takers the greatest losses. For instance, junior-year test takers who score about 500 on SAT-V will, as seniors, average about 507. Juniors who score about 500 on SAT-M will also average about 507 on that scale when they retest as seniors. In contrast, juniors who score at the 300 level will, upon retesting the next year, average about 331 on SAT-V and 342 on SAT-M (College Board, 1991). Johnson, Asbury, Wallace, Robinson, and Vaughn (1985) provided a good discussion of the effects of measurement error (specifically, regression to the mean) in the context of an SAT coaching study sponsored by the NAACP. The gains made by coached students varied markedly according to their initial test scores. Students whose initial scores were below 300 gained 41 verbal points and 75 math points. For those scoring between 300 and 400 initially, gains were 30 and 19 points on the verbal and math scales, respectively. Test takers starting above 400 gained 23 points on the verbal scale but lost 5 points on the math scale. The authors properly noted the role of this regression effect in their estimates of the effects of the coaching program.[2]
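Regression to the mean can be made concrete with a small simulation. The sketch below assumes a hypothetical model in which each observed score equals a fixed true score plus independent normal measurement error; the mean, spread, and error size are illustrative assumptions, not SAT parameters, and no coaching, practice, or growth is modeled. Even so, examinees who score low the first time tend to gain on retest, and those who score high tend to lose.

```python
import random
import statistics


def simulate_retest(n=20000, true_mean=430.0, true_sd=100.0, error_sd=30.0, seed=1):
    """Return (first, second) score pairs under a hypothetical error-only model.

    observed = true score + independent normal error, drawn fresh on each
    occasion. There is no coaching, practice, or growth in this model, so any
    systematic change between occasions comes from measurement error alone.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_score = rng.gauss(true_mean, true_sd)
        first = true_score + rng.gauss(0.0, error_sd)
        second = true_score + rng.gauss(0.0, error_sd)
        pairs.append((first, second))
    return pairs


def mean_gain_in_band(pairs, low, high):
    """Average retest change for examinees whose first score is in [low, high)."""
    gains = [second - first for first, second in pairs if low <= first < high]
    return statistics.mean(gains), len(gains)


pairs = simulate_retest()
for label, low, high in [("below 330", float("-inf"), 330),
                         ("330-530", 330, 530),
                         ("530 and up", 530, float("inf"))]:
    gain, count = mean_gain_in_band(pairs, low, high)
    print(f"{label:>10}: mean change {gain:+.1f} points (n={count})")
```

Low initial scorers show a positive average change and high initial scorers a negative one, while the overall average change stays near zero. This is consistent with the pattern in the retest figures cited above, although real retest gains also include practice and growth, and this sketch does not clip scores to the 200-800 range.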



The Evidence 

Although test takers and their parents may be inclined to rely on word-of-mouth reports from previously coached students, good science demands more than anecdotal evidence: at a minimum, some comparison of coached examinees with uncoached ones,[3] many of whom will, for the reasons noted earlier, also register test score gains despite the lack of any special test preparation. Over the years, many individual studies of the effects of special preparation or coaching for the SAT have been conducted. These studies have differed not only in their methods but also in their results. These apparently conflicting results have made it possible to cite individual studies in support of claims that coaching for the SAT can be very effective or, on the other hand, that it does not work at all.
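The minimal comparison described above amounts to subtracting the average gain of uncoached examinees from the average gain of coached ones, so that practice, growth, and error shared by both groups cancel. The sketch below uses small, made-up score lists (they are not data from any study discussed here) and assumes the two groups are otherwise comparable, which, as later sections show, is often the weak point.

```python
from statistics import mean


def mean_gain(pretest, posttest):
    """Average raw gain; by itself this mixes coaching with practice, growth, and error."""
    if len(pretest) != len(posttest):
        raise ValueError("pretest and posttest must pair up one-to-one")
    return mean(post - pre for pre, post in zip(pretest, posttest))


def difference_in_gains(coached_pre, coached_post, uncoached_pre, uncoached_post):
    """Coaching effect estimate: coached gain minus uncoached gain."""
    return mean_gain(coached_pre, coached_post) - mean_gain(uncoached_pre, uncoached_post)


# Hypothetical scores, for illustration only.
coached_pre, coached_post = [450, 500, 520, 480], [490, 530, 560, 510]
uncoached_pre, uncoached_post = [460, 510, 500, 470], [480, 535, 520, 495]

print("naive coached gain:", mean_gain(coached_pre, coached_post))
print("difference in gains:", difference_in_gains(coached_pre, coached_post,
                                                   uncoached_pre, uncoached_post))
```

With these invented numbers the naive gain of 35 points shrinks to 12.5 points once the uncoached group's 22.5-point gain is subtracted.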

Meta-Analysis as a Tool 
Fortunately, during the early 1980s a powerful analytical technique called "meta-analysis" was developed (Glass, McGaw, & Smith, 1981; Hedges & Olkin, 1985; Hunter, Schmidt, & Jackson, 1982; Light & Pillemer, 1984; Rosenthal, 1984). This procedure has been heralded as one of the most significant developments in social science research methodology in recent years. Its strength is that it enables the integration of individual, possibly conflicting, studies conducted by different researchers under varying conditions. When considered separately, such studies often are based on samples that are too small, too limited in scope, or flawed in ways that preclude unequivocal conclusions. Collectively, however, these individual studies can be informative.
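As a minimal illustration of how a meta-analysis integrates studies, the sketch below computes a fixed-effect, inverse-variance weighted average: each study's effect estimate is weighted by the reciprocal of its squared standard error, so larger, more precise studies count for more. This is one standard pooling rule, chosen here for illustration; the four SAT summaries discussed below differ in their exact procedures, and the effect sizes and standard errors in the example are hypothetical.

```python
import math


def inverse_variance_pool(effects, std_errors):
    """Fixed-effect pooled estimate and its standard error.

    Each study i gets weight w_i = 1 / se_i**2; the pooled estimate is
    sum(w_i * d_i) / sum(w_i), with standard error sqrt(1 / sum(w_i)).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# Hypothetical studies: a precise one reporting 10 points, a noisier one reporting 40.
pooled, se = inverse_variance_pool([10.0, 40.0], [2.0, 4.0])
print(f"pooled effect {pooled:.1f} points (SE {se:.2f})")
```

The pooled value of 16 points sits much closer to the precise study than the unweighted average of 25 would.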

Recently, meta-analysis has been applied to studying the effects of special preparation for standardized tests, including the SAT. In fact, several summaries have now been reported: by ETS researchers Messick and Jungeblut (1981) and by university-affiliated investigators at Harvard (DerSimonian & Laird, 1983), Michigan (Kulik, Bangert-Drowns, & Kulik, 1984), and Michigan State (Becker, 1990). Each of these efforts is reviewed in subsequent paragraphs.

A Definition of Coaching 
Before undertaking this review, however, it will be useful to consider more precisely what is meant by special preparation or coaching, for, as Messick (1982) has suggested, the controversy over coaching has been stoked at least partly by different definitions of these terms. No resolution of these differences will be attempted here. It may, however, be useful to mention some of the salient dimensions on which special preparation may vary.

Summer 1993

Anastasi (1981), Bond (1989), Cole (1982), and Messick (1982) have each discussed various meanings and implications of test preparation and coaching. Special test preparation can vary according to objectives, duration, and methods. It may be designed to affect scores indirectly, by increasing confidence or decreasing anxiety, or to raise scores more directly by teaching specific skills, strategies, or even "tricks." It can entail short-term cramming or long-term instruction. It can involve orientation to general test taking, familiarization with a particular test, review of relevant subject matter, drill-and-practice on sample test questions, or development of academic skills and competencies. When evaluating the effects of coaching, it is therefore necessary to describe in some detail the germane characteristics of the preparation being evaluated. Unfortunately, evaluations do not always describe program characteristics well.

Findings From Four Meta-Analyses 
What, then, is the answer to the question "What is the effect of coaching for the SAT?" The meta-analytic summaries are useful in considering this query. For the "typical" program, the effect is about 15-25 points on each of the verbal and mathematical portions of the SAT. A more precise answer, however, is a more qualified one, and the various meta-analyses now make it possible to explain some of the factors on which the results of individual studies seem to depend.

Objectives. First, however, a word is in order about the nature and intent of the various meta-analyses. In the earliest available quantitative summary, Messick and Jungeblut (1981) reviewed all of the available studies of coaching for the SAT regardless of the way in which the term "coaching" was defined. Focusing on school-based and proprietary programs, these researchers asked, "How much student time devoted to what kinds of coaching experiences yield what level of score improvements in comparison with the level of experiential growth that would have occurred anyway without these coaching experiences?" (p. 192). With regard to the methods used by individual investigators, the authors noted that all of the studies they reviewed were "methodologically flawed in various and divergent ways."

Next, DerSimonian and Laird (1983) analyzed all of the studies considered by Messick and Jungeblut (1981) as well as those reviewed in a narrative summary by Slack and Porter (1980). One of their intentions was to determine the extent to which the individual study estimates included in earlier summaries represented real variation among situations rather than chance fluctuation.
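The question of whether study-to-study differences reflect real variation is what a random-effects meta-analysis addresses. The sketch below implements the method-of-moments estimator of between-study variance that is now commonly associated with DerSimonian and Laird; it is not claimed to reproduce the exact computations of their 1983 coaching review. It computes Cochran's Q from fixed-effect weights, converts the excess of Q over its degrees of freedom into an estimate tau2 of between-study variance (truncated at zero), and then re-pools with weights 1 / (v_i + tau2). The effect sizes and variances in the example are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooling via the DerSimonian-Laird moment estimator.

    Returns a dict with the fixed-effect mean, Cochran's Q, the estimated
    between-study variance tau2, and the random-effects mean.
    """
    k = len(effects)
    if k < 2 or len(variances) != k:
        raise ValueError("need at least two studies, each with a variance")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Excess of Q over its expectation (k - 1) under homogeneity, floored at 0.
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    random_mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return {"fixed": fixed, "Q": q, "tau2": tau2, "random": random_mean}


# Hypothetical effect estimates (points) with equal sampling variances.
result = dersimonian_laird([0.0, 50.0, 10.0, 60.0], [25.0] * 4)
print(result)
```

When the estimates scatter far more than their sampling variances allow, as in this invented example, tau2 comes out well above zero; identical estimates would give Q = 0 and tau2 = 0.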

Kulik, Bangert-Drowns, and Kulik (1984) considered 38 studies of the SAT or other aptitude tests. These authors included only studies that involved "a true test-coaching program, not a program of practice or tutoring" (p. 180). By this, the reviewers meant that students were explicitly instructed in test-taking strategies rather than merely allowed to practice on tests and thereby left to infer effective strategies on their own. Studies that focused mainly on improving specific academic skills (other than verbal and quantitative reasoning) were not reviewed. In their analyses, these authors considered the features of the coaching programs, the methodological characteristics of the studies, and the attributes of the students involved.

In the most recent and most comprehensive review, Becker (1990) analyzed a total of 48 studies, either taken from earlier meta-analyses or completed after these summaries were reported. Becker used an alternative measure of the effect of coaching that allowed the inclusion of all studies employing pretest-posttest comparisons, regardless of whether the studies incorporated a comparison group. Becker considered a number of factors simultaneously and asked about the relative contribution of student characteristics, coaching interventions, and study design to coaching effect estimates. She also asked whether coaching effects differed for the verbal and math portions of the SAT.
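A pretest-posttest effect measure of the kind described here can be computed for a single group, with or without a comparison group. The sketch below uses one common form, the standardized mean change (mean gain divided by the pretest standard deviation); this form is an assumption for illustration, and Becker's exact estimator, including any small-sample correction, may differ. When a comparison group exists, the coaching effect can be expressed as the difference between the two groups' standardized changes. All scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_mean_change(pretest, posttest):
    """Mean gain divided by the pretest standard deviation (no bias correction)."""
    if len(pretest) != len(posttest) or len(pretest) < 2:
        raise ValueError("need at least two paired scores")
    gains = [post - pre for pre, post in zip(pretest, posttest)]
    return mean(gains) / stdev(pretest)


# Hypothetical groups with the same pretest scores.
coached = standardized_mean_change([400, 500, 600], [420, 520, 620])
uncoached = standardized_mean_change([400, 500, 600], [410, 510, 610])
print("coached change:", coached)
print("uncoached change:", uncoached)
print("difference:", coached - uncoached)
```

A single-group study contributes only its own standardized change, which still contains practice and growth; subtracting a comparison group's change removes what the two groups share.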

Principal Findings. Briefly, then, what are the major revelations of these summaries? First, the effects of coaching are somewhat greater for the more curriculum-related mathematics section of the SAT than for the verbal section (Becker, 1990; Messick & Jungeblut, 1981). Also, as one might expect, longer coaching programs yield somewhat greater effects than do shorter ones. However, simply doubling the effort does not double the effect. Diminishing returns set in rather quickly, and the time needed to achieve average score increases much larger than the relatively small increases observed in typical programs rapidly approaches that of full-time schooling (Messick & Jungeblut, 1981). Becker (1990) also documented the relationship between duration of coaching and effects on SAT scores, noting a weaker association after controlling for differences in the kind of coaching and the study design.
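The arithmetic of diminishing returns can be sketched with any concave curve. The function below assumes, purely for illustration, that the average effect grows with the logarithm of contact hours, effect = scale * ln(1 + hours), with a made-up scale of 8 points; it is not a model fitted by Messick and Jungeblut or by Becker. Under such a curve, doubling hours adds far less than double the effect, and the hours required for a much larger target grow exponentially.

```python
import math


def illustrative_effect(hours, scale=8.0):
    """Hypothetical concave response: scale * ln(1 + hours), in score points."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return scale * math.log1p(hours)


def hours_needed(target_points, scale=8.0):
    """Invert the hypothetical curve: hours at which the effect reaches the target."""
    return math.expm1(target_points / scale)


for h in (10, 20, 40):
    print(f"{h:>3} hours -> about {illustrative_effect(h):.0f} points")
for target in (20, 40, 80):
    print(f"{target:>3} points -> about {hours_needed(target):,.0f} hours")
```

Under these assumed numbers, going from 10 to 20 hours raises the effect only from about 19 to about 24 points, while an 80-point target would require thousands of hours.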

Another important conclusion from these summaries is that the estimation of coaching effects depends heavily on the degree to which spurious effects are controlled (e.g., regression, self-selection, noncomparable scores, differential motivation). Studies that merely compare test score gains of coached students with national norms yield "coaching effects" that are about 4 to 5 times greater (and much less consistent) than effects estimated from studies that employ more scientifically rigorous designs (DerSimonian & Laird, 1983). Thus, if greater confidence is placed in the more rigorous studies, then the typical effect is less than 15-25 points on each (verbal and mathematical) portion of the SAT.

Using somewhat different analytical procedures than those used by DerSimonian and Laird (1983), Becker (1990) noted severe confounding among the characteristics of coaching studies, which thwarted her attempt to fit a model that would explain variation in study results across a wide array of investigations. However, a simple model did explain differences among published studies that employed comparison groups. Becker concluded that if these comparison-group studies can be taken as the most rigorous evaluations of the effect of coaching, then "we must expect only modest gains from any coaching intervention" (p. 405): on average, about 9 points for SAT-V and 19 points for SAT-M.

Finally, the review by J. A. Kulik, Bangert-Drowns, and C. C. Kulik (1984) helps by viewing the SAT within the context of other tests. The average effect of coaching for a variety of other aptitude tests was estimated to be nearly three times the average effect for the SAT. This is not surprising because, unlike questions used in some aptitude tests, SAT questions are selected partly on the basis of how unlikely they are to be susceptible to short-term coaching. Because questions that use complex formats have been shown to be more coachable than those using simpler ones (Powers, 1986), complicated formats are not used in the SAT. However, because question formats cannot always be simplified completely, the College Board now also provides a substantial number of materials to ensure that all prospective test takers have ample opportunity to become familiar with each of the question types used in the SAT.

Unresolved Issues. There are still many things that are not, and may never be, known with any absolute certainty about the effects of coaching for the SAT. There are simply too many kinds of coaching, too many kinds of students, and too many difficult-to-control variables to make any comprehensive evaluation feasible. Nonetheless, the meta-analytic summaries have shed some light on several important aspects of the effectiveness of coaching for the SAT. They also provide a context for judging the credibility of new studies. As further studies are completed, the results should be evaluated, to the extent possible, in terms of what these summaries have revealed. New studies with findings that deviate dramatically from the existing summaries may require very careful examination of the coaching methods used and the research designs on which the results are based. In particular, programs that appear especially effective but that (a) are short-term in duration or (b) have not been studied with any scientific rigor should be viewed cautiously until they can be verified.

Recent Studies 

Since the various meta-analytic summaries were published, several additional studies of coaching for the SAT have been completed (or had been completed previously but were not included in the summaries). These additional studies have examined several coaching programs and have used a number of alternative methods to estimate effects.

A Common Problem. All of these studies have included uncoached groups for purposes of comparison, but, as in earlier studies, these comparisons have varied in their degree of scientific rigor. Random assignment of students to coached and uncoached conditions has been attempted infrequently and with mixed success. When students have been permitted to self-select coaching conditions, as is typical, not one of the individual studies has completely controlled for the possibly numerous important differences between coached and uncoached students. Indeed, completely adequate controls may not be possible.

Critical between-group differences may involve such obviously meaningful factors as the extent to which coached and uncoached students also undertake other forms of test preparation available to them. Some evidence suggests that the same factors that lead students to seek formal coaching may also cause them to use other resources in their preparation for the SAT. For example, students who attend coaching programs appear more likely than their uncoached counterparts to review subject matter, to read test preparation books, and to attend review sessions given by their schools (Powers, 1981). To the extent that any of these concomitant preparation strategies is effective, their use will confound estimates of the effects of a given coaching program.
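The confounding described here can be illustrated with a small simulation. In the hypothetical model below, an unobserved trait (labeled motivation) makes students more likely both to enroll in coaching and to use other preparation; the enrollment rates, the 10-point coaching effect, and the 15-point effect of other preparation are all invented for illustration. A naive coached-minus-uncoached comparison then credits part of the other preparation to coaching.

```python
import random
from statistics import mean


def naive_coaching_estimate(n=20000, coaching_effect=10.0, other_prep_effect=15.0, seed=7):
    """Coached-minus-uncoached mean gain under a hypothetical self-selection model."""
    rng = random.Random(seed)
    coached_gains, uncoached_gains = [], []
    for _ in range(n):
        motivated = rng.random() < 0.5
        # Motivated students are more likely to enroll AND to prepare in other ways.
        coached = rng.random() < (0.7 if motivated else 0.2)
        other_prep = rng.random() < (0.8 if motivated else 0.2)
        # Practice, growth, and error are shared by everyone.
        gain = 20.0 + rng.gauss(0.0, 40.0)
        if coached:
            gain += coaching_effect
        if other_prep:
            gain += other_prep_effect
        (coached_gains if coached else uncoached_gains).append(gain)
    return mean(coached_gains) - mean(uncoached_gains)


print("naive estimate:", round(naive_coaching_estimate(), 1), "(true effect is 10)")
print("with no other-preparation effect:",
      round(naive_coaching_estimate(other_prep_effect=0.0), 1))
```

Under these assumptions the naive estimate runs several points above the true 10-point effect (an expected bias of roughly 4.5 points) and falls back near 10 only when other preparation has no effect.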

New Evidence. Although a completely unblemished and fully generalizable coaching study is unlikely, several recent efforts are nonetheless highly informative, especially when considered collectively. Only studies that have employed comparison groups are discussed here. Some other, less rigorous efforts have been critiqued by Smyth (1990).

At Deerfield Academy (in Massachusetts), Fraker (1986-87) studied the effects of a program conducted by a New York City-based commercial coaching company known as the Princeton Review. The simple SAT score gains made from January 1986 to November 1986 by 19 coached students were compared with those made by 119 students from the same school. Both groups were above average with respect to initial SAT scores, and uncoached students had significantly higher initial scores on both measures. Over the 10-month period, coached students gained 16 points more on the math scale but 16 points less on the verbal scale than did uncoached students.

Whitla (1988) compared student-reported SAT score increases made by coached and uncoached students who subsequently enrolled at Harvard University. These students were well above average in SAT performance, with uncoached students scoring somewhat higher than coached students initially. Coached students reported having attended various in-school and commercial coaching programs, including those offered by Stanley Kaplan, Inc., and the Princeton Review. The score gains of coached students exceeded those of uncoached students by 11 points on the verbal scale and 16 points on the math scale. There were no significant differences among coaching enterprises with regard to average score gains.

Zuman (1988) studied the effects of the Princeton Review program for two small groups of New York City eleventh graders, one of which consisted of low-income minority students. One group was well above average on both SAT score scales, and the other was below average on both. Zuman attempted to constitute equivalent comparisons for each of these groups by recruiting students and randomly assigning them to early or later coaching. The intention was to compare the scores obtained by one group before coaching with those obtained by an equivalent group after coaching. This attempt was only partially successful: the comparison samples within each group were quite similar at the start of the study with respect to SAT performance,[4] but because of significant attrition the randomized design was not completely maintained. Furthermore, the results suggest that students in the comparison groups may have lacked motivation to do their best on the facsimile SAT, because their scores decreased slightly upon retesting.

Instead of simply comparing gain scores, Zuman employed regression analysis, using a variety of information about students' background characteristics to provide more statistically precise estimates. Subject to the study limitations mentioned above, the estimated effects were 52 verbal points and 58 math points for the first group, and 0 verbal and 57 math points for the low-income group.

Table 1
Estimated SAT Coaching Effects From Recent Studies

                              Estimated effect     Number of students
Study                         Verbal     Math      Coached   Uncoached
Fraker (1986-87)                 -16       16           19         119
Smyth (1990)                       6       18          501         631
Snedecor (1989)                    0       15          264         271
Whitla (1988)                     11       16          341        1217
Zuman (1988)
  Group 1                         52       58           21          34
  Minority students                0       57           16          17
Median                             3       17

For students in 10 private schools in the Philadelphia, PA, area, Snedecor (1989) compared the score gains made by 271 uncoached students with those made by 264 students who had attended 1 of 10 or more commercial coaching programs. Both groups had above-average SAT scores, with coached students reporting slightly lower initial scores than uncoached students. Average gains made by coached students exceeded those made by uncoached students by 15 points on the math portion of the test. The two groups exhibited equal average gains on the verbal scale. The author reported that although some programs performed better than others, none showed dramatic results.

In a similar effort, Smyth (1989) examined the scores of 200 coached and 238 uncoached students at eight private college-preparatory schools in suburban Baltimore, MD. Students had above-average PSAT scores; coached students had PSAT scores that were somewhat lower on average than those of students who were not coached. Score improvements were defined as the difference between PSAT scores, all of which were obtained prior to any coaching, and the best score on any of three subsequent official SAT administrations. By these standards, coached students gained 6 points more on SAT-V and 32 points more on SAT-M than did uncoached students. Students were coached by five or more different commercial firms. Analyses did not reveal any significant differences among the effect estimates for the major coaching enterprises.

In a subsequent academic year, Smyth (1990) repeated this effort with an additional 300 coached and nearly 400 uncoached students who attended 14 independent secondary schools in Maryland and New Jersey. (Five of the schools had also participated in the earlier study.) Again, students in this study had above-average scores before being coached, and coached students had slightly lower initial scores than did their uncoached counterparts. Combining these data with those from the earlier study, Smyth (1990) found that coached students gained 9 more verbal and 24 more math points than did uncoached students. When these differences were adjusted via analysis of covariance (ANCOVA) for between-group differences in PSAT scores and number of times the SAT had been taken, the more precise effect estimates were 6 and 18 points on the verbal and math scales of the SAT, respectively.
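The covariance adjustment can be sketched for the simplest case of one covariate. The function below computes the classical ANCOVA adjusted difference: the raw coached-minus-uncoached difference in the outcome, corrected by a pooled within-group regression slope times the groups' difference in covariate means. Smyth also adjusted for the number of SAT administrations, which this sketch omits, and the noise-free, PSAT-like scores in the example are invented so that the answer is easy to check.

```python
from statistics import mean


def _cross_products(x, y):
    """Sum of (x - mean x) * (y - mean y) within one group."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def ancova_adjusted_difference(x_coached, y_coached, x_uncoached, y_uncoached):
    """Return (raw difference, adjusted difference, pooled within-group slope)."""
    slope = ((_cross_products(x_coached, y_coached) + _cross_products(x_uncoached, y_uncoached))
             / (_cross_products(x_coached, x_coached) + _cross_products(x_uncoached, x_uncoached)))
    raw = mean(y_coached) - mean(y_uncoached)
    adjusted = raw - slope * (mean(x_coached) - mean(x_uncoached))
    return raw, adjusted, slope


# Hypothetical data: coached students start 4 covariate points lower,
# and coaching adds exactly 18 points to the outcome.
x_unc = [52, 56, 60, 64, 68]
y_unc = [10 * x + 20 for x in x_unc]
x_coa = [48, 52, 56, 60, 64]
y_coa = [10 * x + 20 + 18 for x in x_coa]
print(ancova_adjusted_difference(x_coa, y_coa, x_unc, y_unc))
```

Here the raw comparison shows coached students 22 points behind because they started lower; the adjustment recovers the built-in 18-point effect. With real data the slope is estimated with error, and the adjustment can only remove differences captured by the covariates chosen.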

The results of all of these recent studies are summarized in Table 1. It should be reiterated that the numbers shown are not simple test-score gains but estimates of the effects of coaching above and beyond any effects of growth, practice, and other factors that affect the scores of both coached and uncoached students. With some exceptions, these additional studies are generally consistent with the meta-analytic summaries. Coaching programs, even the most highly publicized ones, appear to have, on average, a small effect on SAT-V scores and a somewhat larger, though still modest, effect on SAT-M scores. The median effects for the studies listed in Table 1 are 3 points for SAT-V scores and 17 points for SAT-M scores. These estimates correspond closely with those computed by Smyth (1990) in the most recent, the largest (in terms of the number of coached students), and arguably the best controlled of the recently reported studies. In addition, these median values are quite close to those given by Becker (1990) of 9 and 19 points, even though only one of these more recent studies (Zuman's) was included in her estimates.

Some of the recent studies have reported results separately by program but have not revealed any dramatic differences among major coaching enterprises with regard to effectiveness. It seems likely, then, that the major differences among individual study estimates result primarily from dissimilarities in the design and execution of the studies, and specifically in how comparisons were established, rather than from differential program effectiveness.

Nonetheless, the issue of especially effective programs may deserve more study. Two nationally franchised commercial firms, the New York City-based Princeton Review and Stanley Kaplan, Inc., offer programs that are generally longer in duration (40 hours or more of classroom coaching) and more expensive (currently up to $700 for the Princeton Review) than most other programs. These firms also appear to enjoy the largest share of the coaching market. The effectiveness of these particular programs may therefore be of special interest.






Six studies have provided effect estimates for the Stanley Kaplan program, and eight have done so for the Princeton Review. These estimates, summarized in Table 2, are generally consistent with the estimates based on all studies considered earlier in this article. The median effect estimates for the Kaplan and Princeton Review programs suggest that these programs are hardly (if at all) more effective in improving SAT scores than are coaching programs generally. Moreover, these estimates provide little basis for concluding that either of these programs is any more effective than the other (although the median estimate for SAT-V appears somewhat higher for Kaplan than for the Princeton Review). A final observation is that the estimates for the Kaplan program are less variable than those for the Princeton Review. Whether this is a function of the size or design of the studies, or of the degree to which these coaching programs are conducted consistently from site to site, cannot be readily ascertained from the data at hand.
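
The medians in Table 2, and the observation that the Kaplan estimates vary less, can be checked with a few lines of code. This sketch transcribes the effect estimates from Table 2 (using the larger Sesnowitz et al. figures, as the unbracketed medians do) and reports the median, range, and sample standard deviation for each program and test section. Range and standard deviation are assumed measures of variability here; the text does not say how variability was judged.

```python
from statistics import median, stdev

# Effect estimates (SAT score points) transcribed from Table 2.
KAPLAN = {
    # Wing, Snedecor, Sesnowitz et al. (larger estimates), Smyth, Whitla
    "verbal": [7, 12, 28, 19, 24],
    "math": [24, 26, 24, 26, 18],
}
PRINCETON_REVIEW = {
    # Fraker, Wing, Snedecor, Zuman (minority), Whitla, Smyth, Zuman (Group 1)
    "verbal": [-16, -16, -2, 0, 8, 12, 52],
    "math": [16, 31, 4, 57, 13, 26, 58],
}


def summarize(estimates):
    """Return (median, range, sample SD rounded to 0.1) for a list of estimates."""
    return median(estimates), max(estimates) - min(estimates), round(stdev(estimates), 1)


for name, program in [("Kaplan", KAPLAN), ("Princeton Review", PRINCETON_REVIEW)]:
    for section, values in program.items():
        med, rng, sd = summarize(values)
        print(f"{name:17s} {section:6s} median={med:>4} range={rng:>3} sd={sd:>5}")
```

Replacing 28 and 24 with the smaller Sesnowitz et al. estimates of 14 and 10 reproduces the bracketed Kaplan medians of 14 and 24. On both sections the range and the standard deviation come out roughly 2.5 to 7 times larger for the Princeton Review than for Kaplan.
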

The bottom-line result of the quantitative summaries and of the most recently completed research, then, is as follows: Reasonably good estimates are now available of the effects of coaching for the SAT, including coaching provided by for-profit firms. These estimates are most assuredly better than the claims currently being made by some commercial coaching enterprises and the word-of-mouth accounts of individual test takers.

Implications 

The decision to seek coaching for the SAT is still one that students and their parents must make individually, but this deliberation ought to be based on the best evidence available. Potential consumers should consider not only the likely benefits but also the expected costs. Costs may entail significant financial outlays but, equally important, lost opportunities. A legitimate question for test takers to ask, then, is, "What could I do, in the 40 or more hours that I might spend at a coaching school, to improve not only my chances of being accepted at the college of my choice but also my chances of succeeding at that college once enrolled? Would I, for example, be better served by concentrating on developing my subject-matter knowledge rather than my test-taking skills?" This question probably has more than one correct answer, and the answer will undoubtedly vary from student to student.

Table 2
SAT Coaching Effect Estimates for Two Commercial Coaching Programs

                                            Estimated effect       Number of
Source                                      Verbal      Math       students coached

Stanley Kaplan
  Wing (1987), as reported by Smyth (1990)  7ᵃ          24ᵃ        72
  Snedecor (1989)                           12          26         22
  Sesnowitz et al. (1982)                   28 (14ᵇ)    24 (10ᵇ)   ᶜ
  Smyth (1990)                              19          26
  Whitla (1988)                             24ᵉ         18ᵉ        ≈75ᵈ
  Median                                    19 (14ᶠ)    24 (24ᶠ)   ≈75

Princeton Review
  Fraker (1986-87)                          -16         16         19
  Wing (1987), as reported by Smyth (1990)  -16         31         61
  Snedecor (1989)                           -2          4          48
  Zuman (1988), minority students           0           57         17
  Whitla (1988)                             8ᵉ          13ᵉ        ≈75ᵈ
  Smyth (1990)                              12          26         66
  Zuman (1988), Group 1                     52          58         34
  Median                                    0           26         48

ᵃThese estimates are based on Smyth's report of a study by Cliff Wing (1987) at Wake Forest University.
ᵇSesnowitz, Bernhardt, & Knain (1982) also reported estimates of 14 points for SAT-V and 10 points for SAT-M. These smaller estimates were based only on adjustments for differences between coached and uncoached students on a number of demographic and personal characteristics (e.g., rank in high school) that were related to test performance, not on earlier test scores. This adjustment was used because students scored lower on earlier tests than was expected from their demographic and personal characteristics.
ᶜThe total number of coached students involved in the study was 514, divided among two schools. How these students were apportioned between the two coaching schools could not be determined from the report.
ᵈThe exact number from each of these schools could not be readily determined.
ᵉBased on data collected by Whitla, but not reported in Whitla (1988).
ᶠMedian when the smaller estimate from Sesnowitz et al. (1982) is used.

With respect to benefits, the effects of coaching may be less than many students suppose. Although coaching schools promise large score increases, there are no real guarantees in terms of the actual effects of coaching. Test takers should be made aware of this critical distinction. Assuming that coaching improves scores by about 10 points on the verbal and about 20 points on the math portion of the SAT, a coached student can expect to improve more than his or her uncoached counterpart about 6 times in 10 on SAT-V and about 6-7 times in 10 on SAT-M.⁵ On the other hand, about 4 times in 10 for SAT-V and 3-4 times in 10 for SAT-M, the typical uncoached student can be expected to exhibit larger score increases than one who has been coached. This is a better statement of the real "guarantee," which seems more in line with students' actual experiences. When they have been asked to give their opinions, fewer than half of coached students have said they were satisfied with their score changes: for example, 24% of those polled by Snedecor (1989) and 43% of those surveyed by Whitla (1988).

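The "6 times in 10" figures rest on the common language (CL) effect size of McGraw and Wong (1992), cited in the notes: the probability that a randomly chosen coached student gains more than a randomly chosen uncoached one. The sketch below assumes, as that statistic does, normally distributed gains, and plugs in the assumed effects of 10 and 20 points and the gain standard deviations given in the notes (45 for SAT-V, 62 for SAT-M). The function name and parameters are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def common_language_effect(mean_difference, sd_group1, sd_group2):
    """McGraw-Wong CL statistic: P(X1 > X2) for independent normal X1 and X2
    whose means differ by `mean_difference` (mean of X1 minus mean of X2)."""
    sd_of_difference = sqrt(sd_group1 ** 2 + sd_group2 ** 2)
    return NormalDist().cdf(mean_difference / sd_of_difference)


# Gains assumed to have the same SD in the coached and uncoached groups.
cl_verbal = common_language_effect(10, 45, 45)
cl_math = common_language_effect(20, 62, 62)
print(f"SAT-V: {cl_verbal:.2f}  SAT-M: {cl_math:.2f}")

# Alternative reading: treat 45 and 62 as the SD of the difference between
# two students' gains, so no sqrt(2) inflation is applied.
print(f"SAT-V: {NormalDist().cdf(10 / 45):.2f}  SAT-M: {NormalDist().cdf(20 / 62):.2f}")
```

The two-group form gives about 0.56 and 0.59; the alternative reading gives about 0.59 and 0.63, which sits closer to the 6-7 in 10 quoted for SAT-M. Either way, one minus each value is the chance that the uncoached student gains more.
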
Messick (1982) suggested that improvements in percentile ranking might serve as a gauge of the practical impact of coaching. Using the most recent data for college-bound seniors (College Board, 1991) as a basis, it appears that improvements of 10 points on SAT-V and 20 points on SAT-M will do relatively little to improve the standing of a "typical" test taker (SAT-V = 420, SAT-M = 470). Improvements of 10 and 20 points will push this test taker's ranking ahead of only a small proportion of additional test takers: from the 48th to the 53rd percentile rank on SAT-V and from the 48th to the 54th on SAT-M. At higher as well as lower score levels, these rankings would change even less. For instance, improving an SAT-V score from 600 to 610 would raise one's standing from the 93rd to the 94th percentile rank, and going from 680 to 700 on SAT-M corresponds to an increase from the 94th to the 96th percentile rank. At the lower score levels, increasing an SAT-V score from 250 to 260 is equivalent to going from the 5th to the 6th percentile rank; increasing an SAT-M score of 290 to 310 will improve the percentile rank from 5 to 8. These figures are, of course, for all college-bound seniors who take the SAT. Because applicants to individual colleges may as a group have less variable scores than do test takers in general, score increases may have a somewhat greater impact on relative standing within applicant pools.
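
Both patterns described above (smaller percentile gains toward the extremes of the score scale, larger gains within a less variable applicant pool) follow from a bell-shaped score distribution. The sketch below assumes normally distributed scores; the means and standard deviations are hypothetical values chosen for illustration, not the College Board (1991) figures, so its output will not match the percentile ranks quoted in the text.

```python
from statistics import NormalDist


def percentile_gain(score, gain, mean, sd):
    """Change in percentile rank (0-100 scale) when `score` rises by `gain`,
    assuming scores are normally distributed with the given mean and SD."""
    dist = NormalDist(mu=mean, sigma=sd)
    return 100 * (dist.cdf(score + gain) - dist.cdf(score))


# HYPOTHETICAL parameters for illustration only; not College Board (1991) data.
ALL_TAKERS = {"mean": 420, "sd": 110}   # broad pool: all college-bound test takers
ONE_COLLEGE = {"mean": 420, "sd": 60}   # narrower, less variable applicant pool

# The same 10-point gain at low, middle, and high starting scores.
for start in (250, 420, 600):
    print(start, round(percentile_gain(start, 10, **ALL_TAKERS), 1))

# The same 10-point gain at the center of a broad versus a narrow pool.
print("all takers: ", round(percentile_gain(420, 10, **ALL_TAKERS), 1))
print("one college:", round(percentile_gain(420, 10, **ONE_COLLEGE), 1))
```

Under these assumptions a 10-point gain is worth roughly 3.6 percentile points at the center of the broad pool, roughly 1 point near 250 or 600, and about 6.6 points within the narrower pool.
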

In conclusion, it is hoped that this discussion of summaries, enhanced by a description of recent individual studies of coaching for the SAT, will in some small way be useful to those who counsel students about preparing for the SAT. Test takers who contemplate undertaking coaching should be helped to critique the claims made by major commercial companies and to ask for explanations of any discrepancies between the assertions made by these companies and the conclusions of the several scholarly summaries discussed here.

Notes 

This article is a considerably expanded version of a "Brief Overview" paper that was supported by funding from the Joint Staff Research and Development Committee of ETS and the College Board. The views are the author's and do not necessarily reflect those of either ETS or the College Board. Special thanks go to Ann Jungeblut, Michael Zieky, and three anonymous reviewers for thoughtful comments, and to Ruth Yoder for her careful handling of the production of this manuscript.

¹Smyth (1990) provides an informative critique of the advertising practices of several commercial companies. He gives an interesting account of the ways in which these companies use the available scientific evidence to make their claims.

²It is interesting to note that this study is one of the very few that have attempted to determine the locus of any coaching effect. The investigators noted that after being coached, test takers were able to reach more questions on both the verbal and the math tests. Correctly answering a reasonable fraction of these items may have accounted for a substantial portion of the test score improvements observed in the study.

³Probably the most appropriate and informative comparison is between students who receive extensive coaching and those who use less costly and less time-consuming resources, such as the free test familiarization provided by the College Board to all SAT takers.

⁴Actually, the study was based on self-reported PSAT scores, scores from a special administration of a retired form of the SAT (for which scores were not reported to colleges), and students' actual SAT scores.

⁵These estimates are based on a method suggested by McGraw and Wong (1992). Estimates used for standard deviations of test score gains (45 for SAT-V and 62 for SAT-M) are those reported by Donlon (1984) for a number of SAT testing years, and they correspond to the standard errors of differences reported most recently by the College Board (1991).

References 

Anastasi, A. (1981). Coaching, test sophistication, and developed abilities. American Psychologist, 36(10), 1086-1093.

Becker, B. J. (1990). Coaching for the Scholastic Aptitude Test: Further synthesis and appraisal. Review of Educational Research, 60, 373-417.

Bond, L. (1989). The effects of special preparation on measures of scholastic ability. In R. L. Linn (Ed.), Educational measurement. New York: American Council on Education and Macmillan Publishing Company.

Cole, N. (1982). The implications of coaching for ability testing. In A. K. Wigdor & W. R. Garner (Eds.), Ability testing: Uses, consequences, and controversies. Part II: Documentation section. Washington, DC: National Academy Press.

College Board. (1991). ATP guide 1991-92 for high schools and colleges. New York: College Board Publications.

DerSimonian, R., & Laird, N. M. (1983). Evaluating the effect of coaching on SAT scores: A meta-analysis. Harvard Educational Review, 53, 1-15.

Donlon, T. F. (Ed.). (1984). The College Board technical handbook for the Scholastic Aptitude Test and Achievement Tests. New York: College Entrance Examination Board.

Fraker, G. A. (Winter 1986-87). The Princeton Review reviewed. The Newsletter. Deerfield, MA: Deerfield Academy.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage Publications.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.

Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage Publications.

Johnson, S. T., Asbury, C. A., Wallace, M. B., Robinson, S., & Vaughn, J. (1985, April). The effectiveness of a program to increase Scholastic Aptitude Test scores of Black students in three cities. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago.

Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179-188.

Light, R. J., & Pillemer, D. B. (1984). Summing up: The science of reviewing research. Cambridge, MA: Harvard University Press.

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361-365.

Messick, S. (1982). Issues of effectiveness and equity in the coaching controversy: Implications for educational and testing practice. Educational Psychologist, 17, 67-91.

Messick, S., & Jungeblut, A. (1981). Time and method in coaching for the SAT. Psychological Bulletin, 89, 191-216.

Powers, D. E. (1988). Preparing for the SAT: A survey of programs and resources (College Board Rep. No. 88-7 and ETS Res. Rep. No. 88-40). New York: College Entrance Examination Board.

Powers, D. E. (1986). Relations of test item characteristics to test preparation/test practice effects: A quantitative summary. Psychological Bulletin, 100, 67-77.

Powers, D. E. (1981). Students' use of and reactions to alternative methods of preparing for the SAT. Measurement and Evaluation in Guidance, 14, 118-126.

Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage Publications.

Sesnowitz, M., Bernhardt, K. L., & Knain, D. M. (1982). An analysis of the impact of commercial test preparation courses on SAT scores. American Educational Research Journal, 19, 429-441.

Slack, W. V., & Porter, D. (1980). The Scholastic Aptitude Test: A critical appraisal. Harvard Educational Review, 50, 154-175.

Smyth, F. L. (1990). SAT coaching: What really happens to scores and how we are led to expect more. The Journal of College Admissions, 129, 7-16.

Smyth, F. L. (1989). Commercial coaching and SAT scores: The effects on college preparatory students in private schools. The Journal of College Admissions, 123, 2-7.

Snedecor, P. J. (1989). Coaching: Does it pay, revisited. The Journal of College Admissions, 125, 15-18.

Whitla, D. K. (1988). Coaching: Does it pay? Not for Harvard students. The College Board Review, 148, 32-35.

Wilder, G., Casserly, P. L., & Burton, N. W. (1988). Young SAT-takers: Two surveys (College Board Rep. No. 88-1). New York: College Entrance Examination Board.

Wing, C. W., Jr. (1987, April). Some field observations of the impact of test preparatory programs on high school students' Scholastic Aptitude Test scores. Winston-Salem, NC: A report to the Awards Committee for Education and Wake Forest University.

Zuman, J. P. (1988). The effectiveness of special preparation for the SAT: An evaluation of a commercial coaching school. Dissertation Abstracts International, 48, 1749A-1750A. (University Microfilms No. DA 8722714)






